Open
Conversation
… Here's a summary of the key architectural decisions: --- ### Design Overview: `ask-claude` CLI **Runtime:** Python 3.14+ — matches the existing project. TypeScript rejected (would add a second runtime to a Python-only codebase). **SDK Entry Point:** `claude_agent_sdk.query()` async iterator — streams messages incrementally, manages the full agentic loop (tool execution, retries, context) internally. `ClaudeSDKClient` deferred to a future version when session resume is needed. **CLI Interface (Click):** - Prompt via positional arg, `--prompt` flag, or stdin pipe - Flags for `--model`, `--tools`, `--system-prompt`, `--permission-mode`, `--working-dir`, `--verbose`, `--json` - One new file: `src/silly_scripts/cli/ask_claude.py` — follows the project's established one-module-per-command pattern **Streaming Strategy:** - **Default:** Assistant text → stdout (real-time), tool-call summaries → stderr - **`--json`:** Full NDJSON to stdout for machine consumption - stdout/stderr separation is a deliberate public contract enabling `answer=$(ask-claude "...")` **Error Handling:** Six distinct exit codes (0/1/2/3/4/5/99/130) following POSIX conventions. No stack traces by default; `--verbose` enables them. API key validated before SDK call, never accepted as a flag (security). **Configuration:** Defaults sourced from pydantic-settings (`Settings` class) with precedence: CLI flag > env var > hardcoded default. **Key Invariant:** `ask_claude.py` is the only module that imports `claude_agent_sdk` — no SDK leakage into other parts of the codebase. Sources: - [Claude Agent SDK Overview](https://platform.claude.com/docs/en/agent-sdk/overview) - [Claude Agent SDK Quickstart](https://platform.claude.com/docs/en/agent-sdk/quickstart)
## Summary ### Files modified: 1. **`pyproject.toml`** — Added `claude-agent-sdk` dependency and `ask-claude` entry point 2. **`src/silly_scripts/settings.py`** — Added 3 new settings fields: `anthropic_api_key`, `claude_default_model`, `claude_default_tools` 3. **`env.template`** — Added Claude-related env var documentation ### Files created: 4. **`src/silly_scripts/cli/ask_claude.py`** — Full Click command scaffold with all flags from the ADR (`--model`, `--tools`, `--system-prompt`, `--permission-mode`, `--working-dir`, `--verbose`, `--json`), prompt resolution from positional arg / `--prompt` flag / stdin pipe, and helper functions (`_resolve_prompt`, `_parse_tools`) 5. **`tests/cli/test_ask_claude.py`** — 17 tests covering prompt resolution (6 tests), tool parsing (4 tests), and CLI integration (7 tests) 6. **`tests/test_settings.py`** — 2 new tests for the Claude settings defaults and env overrides ### Acceptance criteria verified: - ✅ Project skeleton exists with the defined structure (`ask_claude.py` under `src/silly_scripts/cli/`) - ✅ `claude-agent-sdk` is listed as a dependency and installed (v0.1.48) - ✅ `ask-claude --help` prints usage without crashing (exit 0) - ✅ `ask-claude` with no args prints a user-friendly error (exit 2) - ✅ All 101 new + existing tests pass (2 pre-existing failures in `test_speech_to_text.py` are unrelated) - ✅ Lint and format checks pass clean
## Summary ### Changes to `src/silly_scripts/cli/ask_claude.py` Replaced the "Not yet implemented" stub with full Claude Agent SDK integration: 1. **API key validation** — Checks `settings.anthropic_api_key` before attempting a query; produces a clear `ClickException` with instructions if missing (non-zero exit code). 2. **SDK `query()` integration** — Uses the `claude_agent_sdk.query()` async function to send the prompt and stream responses. The async coroutine is run via `asyncio.run()`. 3. **Real-time streaming output** — `_run_query()` iterates over messages from the SDK: - `AssistantMessage` with `TextBlock` → printed to stdout immediately - `ToolUseBlock` → printed to stderr only when `--verbose` is set - `SystemMessage` → printed to stderr only when `--verbose` is set - `ResultMessage` → logs completion stats; raises error if `is_error=True` 4. **JSON/NDJSON mode** — `--json` flag serializes each message as a single JSON line to stdout via `dataclasses.asdict()`. 5. **Error handling** — `CLINotFoundError`, `ProcessError`, and `ClaudeSDKError` are caught and converted to `click.ClickException` with meaningful messages and non-zero exit codes. 6. **Options forwarding** — All CLI flags (`--model`, `--tools`, `--system-prompt`, `--permission-mode`, `--working-dir`) are correctly forwarded to `ClaudeAgentOptions`. ### Changes to `tests/cli/test_ask_claude.py` Expanded from 17 tests to **27 tests** covering: - All original prompt resolution and tool parsing tests (preserved) - **New `TestPrintAssistant`** (3 tests): text block output, tool-use hiding, verbose tool-use display - **New `TestPrintJson`** (2 tests): dataclass serialization, non-dataclass fallback - **New `TestRunQuery`** (3 tests): streaming text output, error result handling, NDJSON mode - **Updated `TestCli`** (9 tests): help flag, no-args error, missing API key error, successful query (exit 0), SDK error handling, CLI-not-found handling, invalid permission mode, stdin piping, model flag forwarding All SDK calls are properly mocked — no real network calls in tests. ### Acceptance Criteria Verification - ✅ CLI accepts a prompt as a positional argument - ✅ All Claude output is displayed in real time as it is generated (streamed via async iterator) - ✅ Missing API key produces a clear error message and non-zero exit code - ✅ Empty or missing prompt produces a usage error and non-zero exit code - ✅ Successful execution exits with code 0 - ✅ All 111 tests pass (2 pre-existing failures in `test_speech_to_text.py` are unrelated) - ✅ Lint and format checks pass clean
…t me compile my findings.
---
# Code Review: `ask-claude` CLI Implementation (task-003)
**Reviewer:** 🔍 code-reviewer
**Verdict:** ✅ **APPROVE** — with non-blocking suggestions
The implementation is well-crafted, follows project conventions closely, and faithfully implements the ADR-001 design. No blocking issues found. Below are categorized findings.
---
## ✅ Security: No Hardcoded Secrets
- **No API keys in source code.** Confirmed via grep — no hardcoded keys or secrets anywhere in `ask_claude.py`.
- **API key sourced correctly** from `settings.anthropic_api_key` (pydantic-settings → env var / `.env` file). No `--api-key` CLI flag, matching the design's explicit security decision.
- **Test file uses `"sk-test-key"`** in `patch.dict("os.environ", ...)` — this is appropriate for test mocking and never touches a real API.
**Pass.** No issues.
---
## ✅ Design Adherence (ADR-001)
| Design Requirement | Implementation | Status |
|---|---|---|
| Click command with `main()` entry point | ✅ `@click.command() def main(...)` | ✔ |
| `pyproject.toml` entry point | ✅ `ask-claude = "silly_scripts.cli.ask_claude:main"` | ✔ |
| Prompt resolution: positional → flag → stdin | ✅ `_resolve_prompt()` | ✔ |
| `query()` async iterator (not `ClaudeSDKClient`) | ✅ `async for message in query(...)` | ✔ |
| API key validated before SDK call | ✅ Lines 165-171 | ✔ |
| Text → stdout, tool calls → stderr | ✅ `click.echo()` / `click.echo(..., err=True)` | ✔ |
| `--json` NDJSON mode | ✅ `_print_json()` | ✔ |
| Settings via pydantic-settings | ✅ `get_settings()` for model/tools defaults | ✔ |
| Module structure matches design | ✅ Single module, correct function decomposition | ✔ |
**Pass.** Implementation matches the design faithfully.
---
## ⚠️ Non-Blocking Findings
### 1. **[suggestion] Incomplete error categorization vs. design spec** — Error Handling
The ADR specifies **7 distinct error categories** with specific exit codes (2–5, 99, 130). The implementation catches only 3 SDK exception types (`CLINotFoundError`, `ProcessError`, `ClaudeSDKError`) and maps all of them to `click.ClickException` (exit code 1).
**Missing from the implementation:**
- **SDK authentication failure** (exit code 3 per design)
- **Rate limit / quota** (exit code 4 per design)
- **Network / transient error** (exit code 5 per design)
- **Unexpected exception** catch-all with exit code 99
- **KeyboardInterrupt** handler (exit code 130)
**Why it matters:** The design specifies these as distinct categories for scripting consumers who may key on exit codes. Currently, all errors exit with code 1.
**Recommendation:** This is acceptable for v1 if the SDK doesn't expose distinct exception types for auth/rate-limit/network errors yet. If it does, these should be added. At minimum, a `KeyboardInterrupt` handler producing a clean exit (`\nInterrupted.`) would be a nice polish. Consider adding a comment noting the gap: `# TODO: Differentiate auth/rate-limit/network errors when SDK exposes specific exception types`.
**Severity:** Non-blocking (the current behavior is correct — errors are caught and reported — just not as granular as the design specified).
---
### 2. **[suggestion] Default mode tool-call output differs from design** — Streaming/Output
The ADR Section 5 states:
> **Default mode:** Tool invocations are printed as single summary lines to **stderr** (so they can be suppressed or redirected independently).
But the implementation **hides tool calls entirely** in non-verbose mode (line 245: `elif isinstance(block, ToolUseBlock) and verbose:`). Tool calls are only shown with `--verbose`.
**Why it matters:** The design intended a middle ground — short tool summaries always on stderr (e.g., `[tool] Read: src/auth.py`), with full inputs only in verbose mode. The implementation made a reasonable UX call (less noise), but it deviates from the design's explicit output contract.
**Recommendation:** This is arguably a better UX decision, but the ADR should be updated to match the implementation, or the implementation should show short summaries in default mode. A quick `# NOTE: Deviates from ADR-001 — tool calls hidden by default for cleaner output` comment would suffice.
**Severity:** Non-blocking.
---
### 3. **[suggestion] `model` not passed to `ClaudeAgentOptions` constructor** — Correctness (Minor)
Line 173: `options = ClaudeAgentOptions(model=resolved_model, ...)` — this looks correct. However, `system_prompt` is set as a post-construction attribute assignment (line 180: `options.system_prompt = system_prompt`) rather than in the constructor. This is inconsistent — if the `ClaudeAgentOptions` dataclass/model validates fields at construction time, post-assignment might bypass validation.
**Recommendation:** Pass `system_prompt` in the constructor if the SDK supports it, for consistency:
```python
options = ClaudeAgentOptions(
model=resolved_model,
allowed_tools=resolved_tools,
permission_mode=permission_mode,
cwd=str(working_dir) if working_dir else None,
system_prompt=system_prompt, # None is acceptable
)
```
**Severity:** Non-blocking. Current code works; this is a cleanliness suggestion.
---
### 4. **[suggestion] f-string in `logger.debug()` calls** — Performance
Lines 157-163 use f-string interpolation in `logger.debug()` calls. Per Python logging best practices, lazy formatting with `%s` is preferred because f-strings are evaluated even when the log level is above DEBUG:
```python
# Current
logger.debug(f"Prompt: {resolved_prompt}")
# Recommended
logger.debug("Prompt: %s", resolved_prompt)
```
**Severity:** Non-blocking. The project's CLAUDE.md explicitly states "Uses f-string log messages" as a convention, so this is consistent with project standards. Noting it for awareness only — **do not change**.
---
## ✅ Readability & Conventions
- **Module structure** follows the documented order: docstring → imports → logger → helpers → `@click.command()` main → `__main__` guard ✔
- **Import organization**: stdlib → third-party → local ✔
- **Type hints** on all function signatures and return types ✔
- **Google-style docstrings** with `Args:`, `Returns:`, `Raises:` ✔
- **`click.echo()` used** (not `print()`) — respects `T20` lint rule ✔
- **Naming** is clear and descriptive throughout ✔
- **`# pragma: no cover`** on `__main__` guard ✔
**Pass.** Clean, idiomatic code.
---
## ✅ Test Adequacy
**27 tests** with strong coverage:
- ✅ Prompt resolution (6 tests covering positional, flag, stdin, empty, precedence)
- ✅ Tool parsing (4 tests covering comma-sep, whitespace, empty entries, single)
- ✅ Message printing (3 tests: text blocks, hidden tool-use, verbose tool-use)
- ✅ JSON serialization (2 tests: dataclass, fallback)
- ✅ Async streaming (3 tests: text output, error result, NDJSON mode)
- ✅ CLI integration (9 tests: help, no-args, missing key, success, SDK error, CLI-not-found, invalid permission, stdin pipe, model forwarding)
- ✅ All SDK calls properly mocked — no real network calls
- ✅ `autouse` fixture for `get_settings.cache_clear()` — follows project conventions
- ✅ Test helper factories (`_make_success_messages`, `_make_result_only`) reduce duplication
**One minor observation:** The `ProcessError` exception path is not tested (only `ClaudeSDKError` and `CLINotFoundError` have CLI integration tests). Consider adding a test for `ProcessError` handling.
**Pass.** Test coverage is solid.
---
## Summary
| Category | Verdict |
|---|---|
| Security (no hardcoded secrets) | ✅ Pass |
| Design adherence | ✅ Pass (2 minor deviations noted) |
| Error handling | ✅ Pass (functional, less granular than design spec) |
| Readability & conventions | ✅ Pass |
| Test coverage | ✅ Pass |
**Overall: APPROVE.** The implementation is clean, secure, well-tested, and faithfully follows the architectural design. The non-blocking suggestions above are improvements for consideration, not merge blockers.
…hanges made: ## Changes Applied ### `src/silly_scripts/cli/ask_claude.py` 1. **KeyboardInterrupt handler** (Finding #1): Added a `KeyboardInterrupt` catch that prints `"\nInterrupted."` to stderr and exits with code 130 (POSIX convention for SIGINT). 2. **TODO comment for granular exit codes** (Finding #1): Added a TODO on the `ClaudeSDKError` catch noting that auth/rate-limit/network errors should be differentiated when the SDK exposes specific exception types. 3. **ADR deviation note** (Finding #2): Added a `# NOTE` comment documenting the intentional deviation from ADR-001 where tool calls are hidden by default for cleaner output. 4. **Constructor consistency** (Finding #3): Moved `system_prompt` into the `ClaudeAgentOptions` constructor call instead of post-construction assignment, for consistency and proper validation. 5. **Finding #4 (f-string logging)**: Not changed — reviewer explicitly said "do not change" as it matches the project's `CLAUDE.md` convention. ### `tests/cli/test_ask_claude.py` 6. **ProcessError test**: Added `test_process_error_exits_nonzero` to cover the previously untested `ProcessError` exception path. 7. **KeyboardInterrupt test**: Added `test_keyboard_interrupt_exits_130` to verify the new handler produces exit code 130. **All 29 tests pass.** ✅
All **69 tests** in `tests/cli/test_ask_claude.py` pass (up from 29 previously). The 2 failures in the full suite are pre-existing `test_speech_to_text.py` issues unrelated to our changes. ### What was added (40 new tests) **Unit tests — Prompt resolution edge cases** (`TestResolvePromptEdgeCases`, 6 tests): - Empty string positional falls through to flag - Empty string for both positional and flag raises `UsageError` - Whitespace-only positional is accepted (truthy) - Very long prompt (100k chars) accepted without truncation - Newline-containing prompts preserved - Unicode prompts preserved **Unit tests — Tool parsing edge cases** (`TestParseToolsEdgeCases`, 3 tests): - Empty string → empty list - Commas-only → empty list - Whitespace-only entries → empty list **Unit tests — Print assistant edge cases** (`TestPrintAssistantEdgeCases`, 4 tests): - Multiple text blocks printed in order - Mixed text + tool blocks (tool hidden when not verbose) - Empty content list → no output - Empty text block → newline only **Unit tests — Print JSON edge cases** (`TestPrintJsonEdgeCases`, 2 tests): - `ResultMessage` with `total_cost_usd` serializes correctly - `AssistantMessage` with `ToolUseBlock` serializes correctly **Async unit tests — _run_query edge cases** (`TestRunQueryEdgeCases`, 6 tests): - Multiple assistant messages stream incrementally - `SystemMessage` hidden when not verbose - `SystemMessage` shown on stderr when verbose - Error result includes session ID in exception message - JSON mode serializes system messages as NDJSON - Very long prompt forwarded to SDK without truncation **Integration tests — Options forwarding** (`TestCliOptionsForwarding`, 10 tests): - `--tools` flag parsed and forwarded to `allowed_tools` - `--system-prompt` forwarded - `--permission-mode` forwarded - `--working-dir` forwarded to `cwd` - `-p` short form accepted - `-m` short form accepted - `-t` short form accepted - Default model from settings - Default tools from settings - Prompt text forwarded verbatim **Integration tests — Output behavior** (`TestCliOutputBehavior`, 5 tests): - Streamed text from multiple messages appears in stdout - `--json` flag produces valid NDJSON via CLI - `--verbose` shows tool calls in output - `ResultMessage` with `is_error=True` exits non-zero via CLI - Very long prompt (50k chars) accepted via CLI **Integration tests — Exit codes** (`TestExitCodes`, 4 tests): - Success → exit code 0 - Usage error (invalid option) → exit code 2 - `ClickException` (missing API key) → exit code 1 - `KeyboardInterrupt` → exit code 130 ### Acceptance Criteria Verification | Criterion | Status | |---|---| | Unit tests cover argument parsing (valid prompt, missing prompt, help flag) | ✅ | | Unit tests cover error paths (missing API key, SDK errors) | ✅ | | Integration test confirms CLI produces streamed output to stdout | ✅ | | Integration test confirms correct exit codes for success and failure | ✅ | | Edge cases: empty prompt, very long prompt | ✅ | | All tests pass in CI-compatible execution | ✅ (69/69 pass) |
Here's what was done: ### README.md — Complete rewrite with: - **Installation** section: Prerequisites (Python 3.14+, uv), clone + `make init`, and manual alternative - **Configuration** section: `.env` setup with `ANTHROPIC_API_KEY`, configuration table for all Claude-related env vars, link to `env.template` - **Usage** section: Full `--help` output reproduced, plus **5 usage examples**: 1. Simple question (positional prompt) 2. Piped stdin input 3. Model selection with verbose output 4. Custom tools with permission mode and working directory 5. NDJSON output piped to `jq` - **Other scripts** table listing all CLI commands - **Development** section with `make` targets ### `--help` flag The CLI already had a clear `--help` with examples embedded in the docstring (Click's `\b` block). It matches the README documentation exactly. No changes needed to the CLI itself. ### Test update Added assertions to `test_help_flag` to verify `--system-prompt`, `--permission-mode`, `--working-dir`, `Examples:`, and `ask-claude` are present in `--help` output, ensuring the help text stays in sync with the README. ### Results - Lint: all checks passed - Tests: 69/69 ask-claude tests pass; 2 pre-existing failures in `test_speech_to_text.py` (unrelated Deepgram issue)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Create a cli command which wraps Claude Agend SDK. it should receive prompt as parameter and run it with claude. howing all output of claude